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Abstract: An important feature of linear mixed models and generalized 
linear mixed models is that the conditional mean of the response given 
the random effects, after transformed by a link function, is linearly re- 
lated to the fixed covariate effects and random effects. Therefore, it is of 
practical importance to test the adequacy of this assumption, particularly 
the assumption of linear covariate effects. In this paper, we review pro- 
cedures that can be used for testing polynomial covariate effects in these 
popular models. Specifically, four types of hypothesis testing approaches 
are reviewed, i.e. R tests, likelihood ratio tests, score tests and residual- 
based tests. Derivation and performance of each testing procedure will be 
discussed, including a small simulation study for comparing the likelihood 
ratio tests with the score tests. 
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1. Introduction 

Linear mixed models (LMMs) [16] and their extension, generalized linear mixed 
models (GLMMs) [2; 28] are popular statistical models for analyzing correlated 
data, including longitudinal and clustered data often arising in biomedical re- 
search. An important feature of these models is that the conditional mean of 
the response given covariates and random effects, after transformed by a link 
function, is linearly related to the fixed covariate effects and random effects. 
The correctness of such model specification, especially the one on parametric 
linear covariate effects, has a significant impact on the validity of the subse- 
quent statistical inference on the covariate effects. Therefore, it is of practical 
importance to check the adequacy of the assumption for the parametric linear 
covariate effects. 

In order to evaluate the adequacy of a parametric covariate effect in a re- 
gression model, one common approach is to cast the problem in the hypothesis 
testing framework, where a broader class of models is selected as the alternatives. 
Nonparametric regression models, due to their flexibility and robustness in mod- 
eling the relationship between a response variable and explanatory variables, are 
often chosen as such alternatives. In practice, however, one rarely directly uses 
pure nonparametric regression models as alternatives because of the intrinsic in- 
finite dimensional problem of nonparametric functions. To overcome such diffi- 
culties, various smoothing techniques, such as kernel smoothing and (penalized) 
spline smoothing, are often applied to estimate nonparametric functions, and 
the resulting estimates are then used as the alternatives for testing the adequacy 
of the parametric covariate effects. In doing so, the infinite dimensional alter- 
natives are reduced to the ones with finite dimensions (or even one dimension 
in some special cases), which significantly simplifies the testing problems. For 
example, it is well-known that a nonparametric function estimated via penal- 
ized splines or smoothing splines has a mixed effects representation [3; 29; 30]. 
An appealing feature of using the mixed effects representation is that one can 
cast the hypothesis test of parametric against nonparametric covariate effects 
as a variance component test, which in most cases is a simple one-dimensional 
testing problem [30; 8]. The likelihood ratio and the score testing approaches 
reviewed here are mainly based on this mixed effects representation. 

Alternatively, testing the adequacy of parametric covariate effects in LMMs 
and GLMMs can also be viewed as a goodness-of-fit problem. The residual based 
tests proposed by Pan and Lin [22] take this view. Specifically, these tests are 
"based on the cumulative sums of residuals over covariates or predicted values 
of the response variable" [22]. The major advantage of this approach is that it 
is valid against any alternatives that deviate from an assumed model. 

For checking the adequacy of parametric covariate effects, we present here 
an overview on four types of hypothesis testing approaches that receive sig- 
nificant attention in the literature: R tests, likelihood ratio tests, score tests 
and residual-based tests. For each test, the derivation and performance are de- 
scribed first in the linear or generalized linear model framework, and then we 
mainly focus on their extensions to mixed models. The paper is organized as 
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follows. Section 2 briefly introduces the models to be considered in this review. 
In Section 3, we review the four testing procedures. In Section 4, we present 
the results from a small simulation study to compare the performance of two 
popular testing procedures, the exact likelihood ratio test and the score test, 
based on mixed effects representation of (penalized) smoothing spline estimates 
of a nonparametric function. The paper is concluded in Section 5 with some 
discussion. 

2. Generalized linear mixed models 

In this section, we briefly introduce the models to be considered and notations 
to be used in this review. Since LMMs are special cases of GLMMs, we will only 
introduce GLMMs for longitudinal/clustered data. Suppose there are m subjects 
(or clusters) in a data set. For the ith subject (i = 1,2,..., to), denote by y^ 
the jth measurement of the response variable (j = 1, 2, . . . , n^), and by Zy, Sjj 
and Uj the jth measurements of the q-dimcnsional covariates z, p-dimensional 
covariates s (not including the intercept) and a scalar covariate t. Given subject- 
specific random effects bi and these covariate values, is assumed to be inde- 
pendent and has a conditional density in an exponential family with conditional 
mean /iy = E(yij\bi) and conditional variance var(j/y|6j) = u>^ (pv(Hij), where 
u>ij is a prior weight, (f> is the dispersion parameter and v(-) is the variance func- 
tion. The conditional mean fj,ij is assumed to be related to the covariates in the 
following GLMM [2] 

g(Vij) = sfjS + m(ty,7) + zfjbi, (2.1) 

where g(-) is a known monotone link function, <5 are fixed effects of s, m(t, 7) = 
70 + 71 i + • • • + jdt d is the c?-order (d is a non-negative integer) polynomial 
covariate effect of t with coefficients 7ft 's, and the random effects bi are usually 
assumed to have a multivariate normal distribution N{0, D(9)} with 9 being the 
vector of unique parameters in the variance matrix of the random effects 

Model (2.1) includes many popular models as special cases. When g(fj,) = fi 
and yij is assumed to have a conditional normal distribution given random 
effects bi, the model (2.1) reduces to an LMM considered by Laird and Ware 
[16]. Suppose we are confident about the parametric linear form sfjS in model 
(2.1) and are mainly concerned with the adequacy of m(t,"f), the polynomial 
covariate effect of t. For this purpose, we consider the following scmiparametric 
additive mixed models (SAMMs) proposed by Zhang and Lin [30] as alternative 
models to model (2.1) 

g(l Hj ) = sJ j 5 + f(t ij ) + z? j b i , (2.2) 

where f(t) is a smooth but arbitrary function. 

Denote y = (y n , ■ ■ ■ , y\n x , ■ ■ ■ , y m i, ■ ■ ■ , y m n m ) T , S = («u, . . . , s Ull , . . . , 
Smi, ■ ■ • , s«J T , b = (61, . . . , b m ) T , Zi = (zfi, zf n .) T , Z = diag{Zi, . . . , Z m }, 
and ji = ~E(y\b). In the next section, we discuss four procedures for checking the 
assumption that f(t) is adequately represented by a polynomial function m(t, 7). 
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3. Four testing procedures 
3.1. R tests 

The R tests, discussed by Hastie and Tibshirani [15], were originally developed 
for testing smoothing parameters during the estimation of nonparamctric func- 
tions through smoothing techniques for independent data. The idea of the R 
tests is analogous to the F statistic frequently used in linear regression models. 
One of the advantages of the R tests is their easy implementation, as under 
the null hypothesis the asymptotic distribution of the R statistic can be ap- 
proximated by the chi-square distribution. However, the estimates of the de- 
grees of freedom of chi-square distributions can be biased, and the resulting 
approximated critical values might be inaccurate. Moreover, the finite-sample 
distribution of the R statistic has not been studied [8]. 

A number of modifications on the original R tests have been made, including 
the correction of the bias of nonparamctric estimates, reconstruction of the 
original test statistics and the corresponding distributions [1; 4; 8]. Here we 
briefly describe a version of R statistics proposed by Hardle et al. [13] under 
the generalized linear model (GLM) framework. They considered the following 
generalized partially linear model, a special case of SAMMs (2.2) for independent 
data (rii = 1): 

g( fH ) = sj6 + f(t i ). (3.1) 

Here, no random effect is required as y^s are independent, so the second sub- 
script j (J = 1) can be dropped for the simplicity of the notation. 

Denote by 8 and / the estimates of 8 and f(t) under the null parametric model 
Hq : f(t) = m(i, 7), and by 8 and / the estimates under the alternative model 
H a : f(t) ? m(t,7). Let £ = g- 1 {sf~8 + f(t l )} and ft = g- l {sj 8 + /<&)}. The 
proposed R statistic for testing Hq : f(t) — m(t;j) versus H a : f(t) ^ m(t;7), 
is defined as 

m 

R = -2j2Q(fii;Pi), (3-2) 
t=i 

where Q is the log quasi-likelihood function defined as Q(nf, yi) = J^' vfo du. 
Note that here the non-parametric estimates are based on kernel smoothing 
methods instead of spline methods as discussed below. As Hardle et al. [13] 
pointed out, the usual likelihood ratio statistic £(/, 8) — C(f, 8), where £(/, 8) = 
SI=i Qi^i'tVi)-: is n °t appropriate in this case as 8 and f(t) are estimated from 
two different likelihood functions. Under the null hypothesis, Hardle et al. [ ] 
showed that the new R statistic has an asymptotic normal distribution, although 
such approximation typically does not work well. Hence Hardle et al. [13] pro- 
posed several sophisticated bootstrap-based approaches to obtain more accurate 
critical values for the R tests. 

Sperlich and Lombardia [21] extended the above R statistic to test Hq : 
f(t) = m(t;7) for a special SAMM with a random intercept only (i.e., Zij = 1). 
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The test statistic they proposed takes the following form: 

R ^ =EE i? ^fe)' ? H/fe) - f(tij) + sT s (S- 5)} 2 7r(t y ), (3.3) 
i=i i=i 

where 7r(.) is a weight function which could be chosen empirically and 

H{f(t ij ),S} = ^jl(y ij ;f,S) 2 , 

with l(yij]f,5) = logf(yij\t,s,f,S), the log density of yij. The R\ w statistic 
is based on "direct comparison" between estimates from nonparametric alter- 
natives and estimates from null parametric models. Furthermore, Sperlich and 
Lombardia [21] showed that the theory of the asymptotic normal distribution 
from Hardle et al. [13] can be carried over to the test statistic R\ w . However, 
the asymptotic approximations often depart from the real finite sample distribu- 
tions of the test statistics, which can lead to poor estimates of the critical values. 
Therefore, a number of bootstrap procedures were suggested to approximate the 
null distribution of the test statistic Ri w . 

It can be immediately seen that construction of the R test statistic and its 
extension R\ w for SAMMs involves the estimation of both the null and alter- 
native models. Estimation of the null model may be relatively straightforward, 
however the model estimation under alternatives can be computationally inten- 
sive and sometimes challenging. The bootstrap procedure used to calculate the 
null distribution of the test statistics also requires significant computation time, 
which may limit the application scope of this testing approach. 

3.2. Likelihood ratio tests 

For testing a parametric versus nonparametric covariate effect, the likelihood 
ratio test (LRT) is a natural choice. The LRT has been popular in situations 
where we need to compare two nested models. However, extending the LRT 
to testing the adequacy of a parametric covariate effect is not straightforward. 
A considerable amount of work has been done in constructing likelihood ratio 
based test statistics for comparing parametric versus nonparametric covariate 
effects. Depending on how the nonparametric alternatives were specified and 
what types of smoothing techniques were used, a number of versions of likelihood 
ratio based testing procedures have been proposed. In this section, we review 
the LRTs based on the mixed model representation of a nonparametric function 
estimated using a (penalized) smoothing spline. 

Crainiceanu and Ruppert [7] considered the exact LRT and restricted like- 
lihood ratio test (RLRT) for testing whether the nonparametric function is a 
certain degree polynomial in the following partially linear model, which is a 
special case of SAMMs (2.2) and generalized partially linear models (3.1), 



Vi = S I S + /(*») + e i, 



(3.4) 
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where 5 and f(t) have the same definitions as before, ti are i.i.d. from N(0, a 2 ) 
and are assumed to be independent of Si and U. The nonparametric function 
f(t) can be approximated through a penalized smoothing spline by the following 
spline function 

K 

/(*) = 7o + lit + • • • + ld t d + Mt - &)+, (3-5) 

fc=i 

where X is a non-negative integer, 7 = (70, ■ • -, 7d) T , a = (aii • • ■, o-k) T are two 
sets of parameters, (t)i = t d for t > and zero otherwise, £1 < ■ ■ ■ < £k are 
fixed knots, and could be defined as the k/(K + l)th sample quantile of t's. In 
order for (3.5) to be a good approximation, K is usually chosen to be large (such 
as 20), in which case it is not desirable to estimate 7 and a directly. A penalized 
spline estimate of f(t) is obtained by minimizing the following penalized least 
square equation 

ra 1 

Y,{Vi-m)-si6} a + ^a T H- 1 a, (3.6) 

i—l 

where A is the smoothing parameter and £ is a pre-specified roughness penalty 
matrix, usually taken to be the identity matrix £ = Ikxk- 

Let A be the m x (d + 1) matrix with the zth row A, = (1, ti, ■ ■ ■, tf) and B be 
the mxK matrix with the ith row Bi = [(ti — ^1)+, (t%— The penalized 
least square equation (3.6) suggests that f(t) has a mixed effects representation 
/ = Aj + Ba, where / = {/(<i), ffa), ■ ■ ■ , f(t m )} T , 7 is considered as fixed 
effects and a is regarded as random effects having the distribution a ~ N(0, a^) 
with a\ = \a\. Denote (3 = (S T ,j T ) T and X = [S\A] where S is the m x p 
matrix with the ith row sf. Then the original partially linear model has the 
equivalent linear mixed model representation 

Y = X(3 + Ba + e. (3.7) 

It can be clearly seen from the penalized spline expression (3.5) that generally 
f(t) is a polynomial of degree d—h (h = 0, 1, . . . , d) if jd-h+i = • • ■ = Id = and 
01 = • • • = = 0, which is equivalent to jd-h+i — ■ ■ • = 7d = and a 2 a = (or 
A = 0) using the linear mixed model representation. Therefore, testing whether 
the covariate effect of t is a (d — /i)-degree polynomial is equivalent to testing 
H ■ Jd-h+i = ■ ■ ■ = Id = 0, a\ = (A = 0) versus H a : ~f d -h+i ^ or • • • 
° r Id 7^ or a\ > (A > 0) if the mixed model representation of a penalized 
smoothing spline is used. One approach proposed by Crainiceanu and Ruppert 
[ : ] for testing this hypothesis is the LRT using the log-likelihood of f3, a\ and 
of from the mixed model representation (3.7) 

Ufl, al of ; 30 - -\ log \V\ - \(Y - XpfV-^Y - X/3), 

where V = a^BB T + (J^I m xm is the marginal variance of Y under the model 
(3.7). In the case where h = 0, the testing problem becomes a variance compo- 
nent test, i.e. Hq : a 2 a = versus H a : a 2 a > 0. Besides the LRT, an alternative 
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choice for testing this particular hypothesis is to use the following REML func- 
tion 

i R (o*, a 2 ; Y) = -± log \V\ - \ log \X T V- 1 X\ - ±(Y - XP) T V~\Y - X% 

where /3 = (X T V- 1 X)- 1 X T V- 1 Y. This method is abbreviated by RLRT. 

As pointed out by Crainiceanu and Ruppert [7] , under H the LRT or RLRT 
asymptotically does not follow a 0.5xq + 0.5xi mixture chi-squarc distribution 
as suggested by Self and Liang [23] and Stram and Lee [25]. Instead, the LRT 
or RLRT asymptotically follows a mixture of Xo an d Xi with a much heavier 
mass on Xo ■ A simple and fast algorithm was also proposed to sample the exact 
null distribution of the LRT or RLRT, which is summarized as follows [7]: 

Step 1: Generate a grid of A values where = Ai < A2 < • • • < A„. 

Step 2: Simulate K independent random variables wf, ■ ■ -,w\ from the 

X?. Let = 

Step 3: Independently simulate X mi K,d = J2T=k+i w s with w 2 ~ Xi- 
Step 4: When h 7^ 0, independently simulate X^ = £ s=1 u 2 with u 2 <~ Xi- 
Step 5: For every grid point Aj calculate 

k . 

N m (Xi) = 2^ —r w s 

K ^2 
D m {\i) = ^ " h X mt K,d- 

Step 6: Obtain \ max that maximizes f m (Xi) over Ai, • • •, A n , where 
/ m (A) = mlogjl + - £ Ml + Ks, m ). 

Step 7: Compute the LRT statistic LRT m = f m (Xmax)+nT' log(l+ 
or LRT m = fm(^max) if /i = 0. For the case of RLRT, compute 



A',, 



Sk+X„ 



RLRT m = sup 

A>0 



(m-p-d-1) log |l + - J2 + V*, m ) 



Step 8: Repeat steps 2-7. 

Here \x s . m and Cs,m arc defined to be the K eigenvalues of the K x K matrices 
Z T P Q Z and Z T Z respectively, where P = I m - X{X T X)~ 1 X T . 

In a recent (unpublished) paper, Claeskens et al. [5] adapted the idea of 
Crainiceanu and Ruppert [7] and explored the advantages of wavelets for es- 
timating nonparametric smooth functions over the use of penalized splines in 
partially linear models for independent data. Two asymptotic distribution the- 
orems were developed for the test statistics proposed therein, and simulation 
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results showed that the wavelet-based test has better performance than the pe- 
nalized spline based test in some situations. They also extended the wavelet 
based test to the cases of simultaneously testing several polynomial covariate 
effects. 

For testing generalized linear models with a single covariate t for independent 
discrete data, Liu et al. [20] proposed three methods which are "based on the 
connection between smoothing spline models and Baycsian models" , assuming 
f(t) in model (3.1) to have the following Bayesian expression 

/(*) = 7o + 7i* + ' ' ' + ldt d + T X / 2 W(t), 

where 70 j 7i> • • • j Id have flat prior, and W(t) is the <i-order Wiener process. 
Under this Baycsian model, they extended the generalized maximum likelihood 
ratio (GML) test of Wahba [27] to test the adequacy of a generalized linear 
model, which is equivalent to Hq : r = 0. The test statistic of the GML test 
proposed by Liu et al. [20] is constructed as 

supr^Lir, <p\y) 

where L(r,(j)\y) denotes the marginal density of y under this Bayesian model. 
Obviously, under the mixed model representation of a smoothing spline estimate 
of a nonparametric function tcML is essentially a LRT. 

One difficulty with the GML test is that there is no closed form expression 
for L(t, 4>\y)i and the test statistic can only be approximated numerically [20]. 
Secondly, it is nearly impossible to analytically derive the null distribution of 
the test statistic as its distribution depends on some unknown parameters. To 
overcome this difficulty, Liu et al. [20] suggested two approaches to approximat- 
ing the exact null distribution of the test statistic. One is the usual bootstrap 
procedure which is computationally intensive. The other approach is the so 
called empirical approximation method, which was considered superior to the 
bootstrap-based method. 

It should be noted that the testing procedures based on the likelihood ra- 
tio are all proposed for models for independent data. Although conceptually 
they can be extended to SAMMs for longitudinal/clustered data, there are at 
least two major obstacles. First the calculation of the likelihood is even more 
complicated under the alternative using the mixed model representation of a 
(penalized) smoothing spline estimate of a nonparametric function. Secondly, it 
may not be easy to extend the algorithm of Crainiceanu and Ruppert [7] , orig- 
inally proposed for simulating the exact distribution of the LRT in a partially 
linear model, to SAMMs or even LMMs for longitudinal/clustered data. More 
future research is needed in this area. 



3.3. Score tests 



In generalized linear models, score tests have been used for testing the overdis- 
persion and heterogeneity of outcomes [10; 24]. Lin [19] extended score tests to 
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GLMMs, in which a global score test as well as individual score tests were pro- 
posed to test the null hypotheses of all zero random-effect variance components 
and individual zero random-effect variance components respectively. 

Zhang and Lin [31 1] considered the problem of testing the nonparametric func- 
tion f(t) in model (2.2) being a d-order polynomial. They first estimated f(t) by 
a d-order smoothing spline and expressed / with a mixed effects representation, 
similar to the one in Section 3.2 for a penalized smoothing spline 

f = T 1 + Ta, (3.9) 

where / = f(t°), t° is the vector formed by distinct {%}'s, T is a matrix formed 
by zero to the dth polynomials of t° with corresponding coefficients 7, T is a 
smoothing matrix, and a ~ N(0, tI). Note that this mixed effects representation 
is basically the same as the Baycsian expression presented in Section 3.2. 

Denote by N the incidence matrix mapping t° to {£y}'s, and define X = 
(NT,S), B = NT. Then under the mixed effects representation (3.9), SAMM 
(2.2) becomes the following GLMM 

g(n) = XP + Ba + Zb, (3.10) 

where j3 = ("f T ,5 T ) T are the new fixed effects and (a, b) are the new random 
effects. 

As described in the earlier sections, testing f(t) in SAMM (2.2) being a d- 
order polynomial is equivalent to testing Ho : r = in the induced GLMM 
(3.10). Zhang and Lin [ I] adapted the idea of Lin's [19] variance component 
score tests to test Hq : r = 0. However, they pointed out that the score tests 
proposed by Lin [19] for testing zero variance components in GLMMs cannot 
be used directly for testing Hq : r = 0. They proposed a scaled chi-squarcd 
approximation to the test statistic. 

Denote by -0 = (Q T ,<p) the nuisance parameter vector, and by £m(t,iP) the 
marginal log-likelihood function of r and ip (by integrating out random effects 
a, b and fixed effects 0), Then under the induced GLMM (3.10), the score U T 
for testing Hq : r = takes the following form 



d£ M (T,ip;y) 



Or 



(3.11) 



T=O,-0 



-{(Y - Xj3) T V- l NTN T V- l {Y - Xf3) - tr(PNTN T )} 



where (3 is the MLE of (3 and t/j is the REML-type of estimate of tp under the 
following null GLMM (3.12), and Y = X(3 + Zb + A(y-fjt) is the working vector 
from the null GLMM 

g(f,)=X(3 + Zb, (3.12) 

where P = V' 1 - V~ 1 X(X T V~ 1 X)~ 1 X T , V = W' 1 + ZGZ T , G = diag{A 
...,£>}, A = di&gig'faij)}, W = diag{w y } and w %3 = {^^(MijOtfl'CMij)] 2 } -1 - 
Note that model (3.12) is the matrix representation of the original GLMM (2.1). 
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Because of the special structure of S, Zhang and Lin [30] found that the 
score W T (V>) does not follow an asymptotic normal distribution. Write U T (ip) 
as U T (ip) = U T (y;i/j) — e(ip), where U r (y;ip) and e(ip) denote the first and the 
second terms of the above score, and define ipo as the true value of ip under 
Hq : t = 0. Zhang and Lin [30] showed that the null distribution of U T (y;ipo) 
is approximately equal to the one of weighted chi-squared random variables 
and can be well approximated by a scaled chi-squared distribution. Since the 
expectation of U T (ip) is an increasing function of r, larger values of U T (ip) give 
more evidence against Hq, which indicates that the score test should be one- 
sided. 

Compared with the LRTs, one major advantage of using the score test statis- 
tic U T {y 1 ip) is its easy implementation, as it can be calculated directly by fit- 
ting a GLMM (under the null hypothesis) rather than a SAMM. In addition, 
the critical values can be directly approximated from the regular chi-square dis- 
tribution. Therefore, it is not necessary to derive the distribution of the test 
statistics under the null hypothesis as often required by the LRTs. Secondly, as 
SAMMs encompass a broad class of statistical models, the above score test can 
be applied in many situations, such as independent Gaussian data [6], clustered 
Gaussian or binary data, etc. For clustered data, the implementation of the 
LRTs can be very difficult as expensive computation is needed to approximate 
the null distribution of the test statistics. 

The simulation results showed that the score test statistic above performs 
very well for Gaussian outcomes, less so for binary data due to the poor approx- 
imation of the Laplace method in calculating the score statistic, but improves 
rapidly as the binomial denominator increases [30]. 

3-4- Residual based tests 

Inspired by the idea of residual plots for checking the goodness-of-fit of regres- 
sion models, recently Pan and Lin [22] introduced a graphical and numerical 
approach to assess the adequacy of GLMMs. These methods are "based on the 
cumulative sums of residuals over covariates or predicted values of the response 
variable" [22] and are the further extensions of the work by Su and Wei [2G] and 
Lin et al. [18]. 

Denote by /Uy(/3, 9, (f>) = E(yij), the marginal mean of yij and define residual 
eij as e,j = j/y — jujj, where /Zy = /i y ■(/?, 9, (j>), and /3, 9, 4> are the estimates of 
the corresponding parameters under the original GLMM (2.1) or model (3.12) in 
the matrix notation. Pan and Lin [22] then considered the following two classes 
of stochastic processes 

rn ni 

W(x) = m - i/2 tfai - x ) e ^' 

»=i i=i 

m rii 

W g (r)=m- 1 / 2 J2Y f I('Pi3<r)e ij , 
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where x = (xi, ■ ■ ■, x p ) T , r G 72., /(a;^ < x) = I(xuj %pij < and 

Xfcy is the fcth component of Xij . 

Under the assumed GLMM, these stochastic processes converge in distribu- 
tion to zero-mean Gaussian processes, which can be simulated through Monte 
Carlo techniques. Each observed cumulative-sum process W(x) or W g (r) can 
then be compared, both visually and analytically, to a certain zero-mean Gaus- 
sian process. If the assumed GLMM is a reasonable model for the given data, the 
cumulative-sum processes would behave like white noise. Therefore, any abnor- 
mal departure of W(x) or W g (r) from the zero-mean Gaussian processes would 
be an indication of model mis-specification. The main advantage of this testing 
approach is that there is no need to specify the alternatives, therefore it can be 
used to test whether or not f(t) in SAMM (2.2) can be adequately represented 
by a polynomial function. Nevertheless this test may be less powerful compared 
to the other procedures specifically designed for testing f(t). 

Introduced by Fan and Huang [11], another residual based test is the so 
called "adaptive Neyman test". Although the test statistic is constructed in a 
completely different way, the basic idea is similar to the one described above, 
i.e. if a parametric model fits data well, the residuals should fluctuate around 
0. They focused on the classical nonparamctric model, which is y = f(x) + e 
with e ~ N(0,a 2 ). Under the null hypothesis /(•) = m(-,7) for some 7, where 
m(-, 7) belongs to a given parametric family, the resulting residuals are given as 

= Hi ~ m { x iil)i i — lj ' ' '7 n i where 7 is the estimate of 7 under the assumed 
model. Denote <T = (ei, . . . ,e n )i then t is nearly independently and normally 
distributed with mean vector r\ = (rji, ■ ■ •, rj n ) T , where rji = f(xi) — m(xi, 70) and 
70 is the convergent limit of 7. Thus, the testing problem can be constructed as 
H : r) = versus H a : 77 7^ 0. Fan and Huang [11] adopted the adaptive Neyman 
test to this testing problem. The adaptive Neyman test statistic is constructed 
based on the Fourier transform of the residuals <T with its exact null distribution 
being generated through simulations. 

As mentioned earlier, the adaptive Neyman test has only been studied in 
partially linear models. So, extending it to LMMs or GLMMs could potentially 
be a future research direction. 

4. Comparison between the exact likelihood ratio and the score tests 

In this paper, we provided an overview of the four types of testing approaches. 
Among them, likelihood ratio and score tests have been widely used in a vari- 
ety of hypothesis testing problems. To our knowledge, however, no comparison 
between these two tests has been investigated for the current situation, i.e. test- 
ing a parametric covariate effect against a nonparametric covariate effect. Here, 
we conduct a small simulation study to evaluate and compare the performance 
of these two popular testing procedures. For illustration purposes, we consider 
testing the linearity of covariate effects under the partially linear model frame- 
work, i.e. whether f(t) is a linear function of t in model (3.4). Following the 
penalized spline, we formulate the exact LRT (named as LRT1), RLRT and the 
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score test as variance component tests based on the mixed model representation 
(3.7) as discussed above. In additon, for testing the same null hypothesis, we 
also formulate the exact LRT in a different way (named as LRT2) by modeling 
the alternative through a quadratic spline. In the latter case, we are testing 
whether f(t) is a (d — h)-degree polynomial of t with d = 2 and h = 1. 

Since no exact LRT or RLRT has been developed for mixed models for longi- 
tudinal/clustered data, we only consider partially linear models for independent 
data even though Zhang and Lin's [30] procedure is applicable to more compli- 
cated models. 

Data in this simulation arc generated from the following partially linear model 
in = Siifh + Si2p2 + f(U) + e*, i = 1, 2, . . . , m 

where sn is generated from N(0,0.3), is generated from N(0,0.4), V s are 
equally spaced distinct points in [0,1], and e.; ~ N(0, a 2 ). The true values of /3i 
and /? 2 are set to be 1.3 and 0.45 respectively. The values of a are 0.25 and 
0.5, and the sample size m is taken to be 50 and 100. A total of five different 
functions of f(t) are considered, i.e., f c (t) = (0.25c)f ■ exp{2 — 2t) — t + 0.5, 
for c = (0, 1,2,3,4) [30]. Note that when c = 0, f c (t) is a linear function of t 
and f c (t) deviates further from linearity with increasing c. We apply the exact 
LRT1, LRT2, RLRT and the score testing procedures to each simulated data 
set. The simulation results are based on 1000 Monte Carlo simulation runs. 

For testing the null hypothesis that f(t) is a linear function of t, the size 
and power of each testing procedure are calculated by setting c = and c ^ 
respectively. When a penalized spline is used to estimate f(t) as in the LRT 
or RLRT, the number of knots for the penalized spline is set to be 20. For the 
score testing procedure, the smoothing matrix X is from a natural smoothing 
spline. 

The simulation results arc presented in the Table 1 (m = 50) and Table 2 
(m = 100), where the nominal levels are set to be 0.05 and 0.1. Regarding the 
empirical size, our simulation results show that the exact LRT2, RLRT and the 
score test are all close to the nominal levels. The empirical size of the LRT1, 
however, stays unchanged even if the nominal level increases from 0.05 to 0.1. 
Overall the increased sample size brings the empirical sizes of all these tests 
closer to the nominal levels, whereas the error noise seems to have not much 
influence on them. With respect to the power, all tests show decreased power 
as the error variance increases. As expected, the increased sample size improves 
the overall power. Note that the powers of the LRT1 are also unchanged as the 
nominal level increases, which implies that the simulated critical values for the 
LRT1 may not be accurate with a moderate number of Monte Carlo simulation 
runs. In general, our simulation indicates that the LRT2, RLRT and score test 
are more powerful than the LRT1, with the score test slightly out-performing 
the exact LRT2 and RLRT. 

In comparing to likelihood ratio based tests, the score test has at least two 
main advantages. First the exact LRT (LRT1 and LRT2) and RLRT are com- 
putationally much more intensive than the score test, as deriving the null dis- 
tributions of the LRT and RLRT statistics requires simulation in each run. The 
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Table 1 

Empirical sizes and powers of the four tests in testing the linearity of covariate effects in 

model (3.4) where m = 50 



nominal 


u 


Test 


Size 




Power 




level 






c = 


c = 1 


c = 2 


c = 3 


c = 4 


0.05 


0.25 


LRT1 


0.032 


0.152 


0.696 


0.991 


1.000 






LRT2 


0.049 


0.419 


0.935 


0.999 


1.000 






RLRT 


0.067 


0.419 


0.927 


1.000 


1.000 






Score 


0.066 


0.443 


0.948 


1.000 


1.000 




0.5 


LRT1 


0.066 


0.094 


0.224 


0.473 


0.782 






LRT2 


0.047 


0.135 


0.412 


0.737 


0.923 






RLRT 


0.050 


0.123 


0.404 


0.720 


0.915 






Score 


0.060 


0.158 


0.448 


0.762 


0.936 


0.1 


0.25 


LRT1 


0.032 


0.152 


0.696 


0.991 


1.000 






LRT2 


0.115 


0.548 


0.962 


0.999 


1.000 






RLRT 


0.138 


0.545 


0.970 


0.999 


1.000 






Score 


0.124 


0.560 


0.972 


1.000 


1.000 




0.5 


LRT1 


0.066 


0.094 


0.224 


0.473 


0.782 






LRT2 


0.093 


0.230 


0.545 


0.838 


0.961 






RLRT 


0.103 


0.213 


0.531 


0.832 


0.960 






Score 


0.104 


0.242 


0.565 


0.859 


0.970 



Table 2 

Empirical sizes and powers of the four tests in testing the linearity of covariate effects in 

model (3.4) where m = 100 



nominal 


(7 


Test 


Size 




Power 




level 






c = 


c = 1 


c = 2 


c = 3 


c = 4 


0.05 


0.25 


LRT1 


0.044 


0.217 


0.950 


1.000 


1.000 






LRT2 


0.053 


0.675 


0.994 


1.000 


1.000 






RLRT 


0.052 


0.661 


0.995 


1.000 


1.000 






Score 


0.052 


0.691 


0.997 


1.000 


1.000 




0.5 


LRT1 


0.068 


0.115 


0.364 


0.810 


0.988 






LRT2 


0.059 


0.240 


0.681 


0.956 


0.999 






RLRT 


0.054 


0.221 


0.670 


0.959 


0.999 






Score 


0.062 


0.249 


0.697 


0.963 


0.999 


0.1 


0.25 


LRT1 


0.044 


0.217 


0.950 


1.000 


1.000 






LRT2 


0.109 


0.778 


0.998 


1.000 


1.000 






RLRT 


0.102 


0.762 


0.999 


1.000 


1.000 






Score 


0.107 


0.779 


1.000 


1.000 


1.000 




0.5 


LRT1 


0.068 


0.115 


0.364 


0.810 


0.988 






LRT2 


0.103 


0.353 


0.781 


0.975 


1.000 






RLRT 


0.112 


0.336 


0.777 


0.982 


1.000 






Score 


0.111 


0.363 


0.798 


0.983 


1.000 



computing time of the exact LRT and RLRT in this simulation is 50 times more 
than that of the score test. Secondly, the exact LRT and RLRT have not yet been 
developed for more complicated models such as LMMs and GLMMs, whereas 
the score testing procedure is flexible and can be adapted to many modeling 
situations. For simplicity, only the linearity test is considered in the current 
simulation; however in practice, one might be interested in testing higher-order 
polynomial covariate effects (i.e. d > 1), which can be easily carried out by 
using a different d. Overall we consider the score test is a better choice than the 
LRT and RLRT. 
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5. Summary 

We overview the main development of the four types of testing approaches used 
for testing a parametric covariate effect versus a nonparametric covariate effect. 
A considerable amount of work has been done with the LRTs under linear 
or generalized linear models. The likelihood based tests perform very well for 
independent data in finite sample situations. However, these test statistics can 
be difficult to compute in a more complex model, as both the parametric and 
nonparametric models need to be estimated. 

In addition, deriving the null distributions of those test statistics can be 
challenging. Therefore, it is not straightforward to extend the existing LRTs 
or RLRTs to LMMs and GLMMs. Compared to the LRTs or RLRTs, the score 
statistics are easy to compute, usually show good performance and are applicable 
to both LMMs and GLMMs. Further study may be needed to investigate the 
properties of the score tests for small samples. The R tests are likelihood-ratio- 
based tests, hence they share the same advantages and disadvantages as the 
LRTs. The recently developed residual-based test [22] can be considered as an 
omnibus test for detecting model mis-specification and can be used to test the 
adequacy of a polynomial covariate effect. Since no alternative models need to 
be specified, the residual-based test is applicable in many situations including 
LMMs and GLMMs. However, it may be less powerful than the other testing 
procedures that are specifically designed for testing a particular covariate effect. 
Comparison of the residual-based tests with the score tests in mixed models 
could be of future interest. 

Acknowledgements 

The research of Daowen Zhang is partly supported by an NIH grant R01 CA85848- 
08. I would like to thank the referee and the managing editor Wendy Martinez 
for many valuable suggestions that greatly improved the presentation of this 
paper. 

References 

[1] Azzalini, A. and Bowman, A. (1993). On the use of nonparametric 
regression for checking linear relationships. Journal of Royal Statistical So- 
ciety - B 55, 549-557. MR1224417 

[2] Breslow, N. E. and Clayton, D. G. (1993). Approximate inference 
in generalized linear mixed models. Journal of the American Statistical 
Association 88, 9-25. 

[3] Brumback, B., Ruppert, D. and Wand, M. P. (1999). Comment on 
variable selection and function estimation in additive nonparametric re- 
gression using data-based prior' by Shively, Kohn and Wood. Journal of 
the American Statistical Association 94, 794-797. 



M. Huang and D. Zhang/Testing polynomial covariate effects 



168 



[4] Cantoni, E. and Hastie, T. (2002). Dcgrees-of-freedom tests for smooth- 
ing splines. Biometrika 89, 251-263. MR1913957 

[5] Claeskens, G., Ding, H. and Jansen, M. (2007). Lack-of- 
fit tests in semiparametric mixed models. Available on web at 
www. econ.kuleuven.be/fetew/pdf43ublicaties/KBI_O709 .pdf 

[6] Cox, D., Koh, E., Wahba, G. and Yandell, B. (1988). Testing the 
(parametric) null model hypothesis in (semiparametric) partial and gener- 
alized spline models. Annals of Statistics 21, 903-923. MR0924859 

[7] Crainiceanu, C. M. and Ruppert, D. (2004). Likelihood ratio tests 
in linear mixed models with one variance component. Journal of Royal 
Statistical Society - B 66, 165-185. MR2035765 

[8] Crainiceanu, C. M. and Ruppert, D. (2005). Exact likelihood ratio 
tests for penalized splines. Biometrika 92, 91-103. MR2158612 

[9] Crainiceanu, C. M., Ruppert, D. and Vogelsang, T. J. (2003). 
Some properties of likelihood ratio tests in linear mixed models (unpub- 
lished). 

[10] Dean, C. (1992). Testing for overdispersion in Poisson and binomial regres- 
sion models. Journal of the American Statistical Association 87, 451-457. 

[11] Fan, J. Q. and Huang, L. S. (2001). Goodness-of-fit tests for parametric 
regression models. Journal of the American Statistical Association 96, 640- 
652. MR1946431 

[12] Gu, C. (1992). Penalized likelihood regression: a Baycsian analysis. Statis- 
tica Sinica 2, 255-264. MR1152308 

[13] Hardle, W., Mammen, E. and Muller M. (1998). Testing parametric 
versus semiparametric modeling in generalized linear models. Journal of 
the American Statistical Association 93, 1461-1474. MR1666641 

[14] Harville. D. A. (1976). Extension of the Gauss-Markov theorem to in- 
clude the estimation of random effects. Annals of Statistics 4, 384-395. 
MR0398007 

[15] Hastie, T. and Tishirani, R. (1990). Generalized additive models. Chap- 
man & Hall, New York. MR1082147 

[16] Laird, N. M. and Ware, J. H. (1982). Random effects models for lon- 
gitudinal data. Biometrics 38, 963-974. 

[17] Liang, H. (2006). Checking linearity of non-parametric component in 
partially linear models with an application in systemic inflammatory re- 
sponse syndrome study. Statistical Methods in Medical Research 15, 273- 
284. MR2227449 

[18] Lin, D. Y., Wei, L. J., and Ying, Z. (2002). Model-checking techniques 
based on cumulative residuals. Biometrics 58, 1-12. MR1891037 

[19] Lin, X. (1997). Variance component testing in generalized linear models 
with random effects. Biometrika 84, 309-326. MR1467049 

[20] Liu, A., Meiring, W. and Wang, Y. (2004). Testing generalized lin- 
ear models using smoothing spline methods. Statistic Sinica 15, 235-256. 
MR2125730 

[21] Lombardia, M. J. and Sperlich, S. (2007). Semiparametric inference in 
generalized mixed effects models, http : //ssrn. com/abstract=1010928. 



M. Huang and D. Zhang/Testing polynomial covariate effects 



169 



[22] Pan, Z. and Lin, D. Y. (2005). Goodness-of-fit methods for generalized 
linear mixed models. Biometrics 61, 1000-1009. MR2216193 

[23] Self, S. G. and Liang, K. Y. (1987). Asymptotic properties of max- 
imum likelihood estimates and likelihood ratio tests under non-standard 
conditions. Journal of the American Statistical Association 82, 605-610. 
MR0898365 

[24] Smith, P. J. and Heitjan, D. F. (1993). Testing and adjusting for de- 
partures from nominal dispersion in generalized linear models. Applied. 
Statistics 41, 31-41. 

[25] Stram, D. O. and Lee, J. W. (1994). Variance components testing in 
the longitudinal mixed effects model. Biometrics 50, 1171-1177. 

[26] Su, J. Q. and Wei, L. J. (1991). A lack-of-fit test for the mean function in 
a generalized linear model. Journal of the American Statistical Association 
86, 420-426. MR1 137124 

[27] Wahba, G. (1990). Spline models for observational data. CBMS-NSF re- 
gional conference series in applied mathematics, SIAM. MR1045442 

[28] Zeger, S. L. and Karim, M. R. (1991). Generalized linear models with 
random effects: A Gibbs sampling approach Journal of the American Sta- 
tistical Association 86, 79-86. MR1137101 

[29] Zhang, D. and Lin, X. (1998). Scmiparametric stochastic mixed models 
for longitudinal data. Journal of the American Statistical Association 93, 
710-719. MR1631369 

[30] Zhang, D. and Lin, X. (2003). Hypothesis testing in semiparametric 
additive mixed models. Biostatistics 4, 57-74. 

[31] Zhang, D. (2004). Generalized linear mixed models with varying coeffi- 
cients for longitudinal data. Biometrics 60, 8-15. MR2043613 



